This is our R Notebook, showing the steps we took to complete the Final Project for CS 329E. This notebook includes step-by-step instructions on how to reproduce our project. To obtain our data, we used data.world.
Below we display our sessionInfo().
sessionInfo(package=NULL)
## R version 3.2.4 Revised (2016-03-16 r70336)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] backports_1.0.5 magrittr_1.5 rprojroot_1.2 tools_3.2.4
## [5] htmltools_0.3.5 yaml_2.1.14 Rcpp_0.12.10 stringi_1.1.2
## [9] rmarkdown_1.3 knitr_1.15.1 stringr_1.1.0 digest_0.6.12
## [13] evaluate_0.10
The data was found on “Dr. John Rasp’s Statistics Website” (http://www2.stetson.edu/~jrasp/data.htm). It is a subset of the data from College Scorecard, a Department of Education website that gives data on various variables regarding tuition, costs and school performance.
An explanatory key for the recorded variables can be found here: https://data.world/jlee/s-17-dv-final-project/file/CollegeScorecard_ColumnNames.pdf
Here’s our ETL file to clean our data set.
source("../01 Data/R_ETL.CollegeScorecard.R")
## Loading required package: readr
## Warning: package 'readr' was built under R version 3.2.5
## Loading required package: plyr
## Warning: package 'plyr' was built under R version 3.2.5
## Parsed with column specification:
## cols(
## .default = col_character(),
## UNITID = col_integer(),
## CONTROL = col_integer(),
## CCBASIC = col_integer()
## )
## See spec(...) for full column specifications.
## Warning: 421 parsing failures.
## row col expected actual file
## 7283 CCBASIC an integer NULL '../../CSVs/PreETL_CollegeScorecard.csv'
## 7284 CCBASIC an integer NULL '../../CSVs/PreETL_CollegeScorecard.csv'
## 7285 CCBASIC an integer NULL '../../CSVs/PreETL_CollegeScorecard.csv'
## 7286 CCBASIC an integer NULL '../../CSVs/PreETL_CollegeScorecard.csv'
## 7287 CCBASIC an integer NULL '../../CSVs/PreETL_CollegeScorecard.csv'
## .... ....... .......... ...... ........................................
## See problems(...) for more details.
## Classes 'tbl_df', 'tbl' and 'data.frame': 7703 obs. of 30 variables:
## $ UNITID : int 100654 100663 100690 100706 100724 100751 100760 100812 100830 100858 ...
## $ INSTNM : chr "Alabama A & M University" "University of Alabama at Birmingham" "Amridge University" "University of Alabama in Huntsville" ...
## $ CITY : chr "Normal" "Birmingham" "Montgomery" "Huntsville" ...
## $ STABBR : chr "AL" "AL" "AL" "AL" ...
## $ CONTROL : int 1 1 2 1 1 1 1 1 1 1 ...
## $ CCBASIC : int 18 15 20 16 19 16 1 22 18 16 ...
## $ ADM_RATE : chr "0.5256" "0.8569" "NULL" "0.8203" ...
## $ SAT_AVG : chr "827" "1107" "NULL" "1219" ...
## $ UGDS : chr "4206" "11383" "291" "5451" ...
## $ UGDS_WHITE : chr "0.0333" "0.5922" "0.299" "0.6988" ...
## $ UGDS_BLACK : chr "0.9353" "0.26" "0.4192" "0.1255" ...
## $ UGDS_HISP : chr "0.0055" "0.0283" "0.0069" "0.0382" ...
## $ UGDS_ASIAN : chr "0.0019" "0.0518" "0.0034" "0.0376" ...
## $ UGDS_AIAN : chr "0.0024" "0.0022" "0" "0.0143" ...
## $ UGDS_NHPI : chr "0.0019" "0.0007" "0" "0.0002" ...
## $ UGDS_2MOR : chr "0" "0.0368" "0" "0.0172" ...
## $ UGDS_NRA : chr "0.0059" "0.0179" "0" "0.0332" ...
## $ UGDS_UNKN : chr "0.0138" "0.01" "0.2715" "0.035" ...
## $ PPTUG_EF : chr "0.0656" "0.2607" "0.4536" "0.2146" ...
## $ NPT4_PUB : chr "15229" "14789" "NULL" "18596" ...
## $ NPT4_PRIV : chr "NULL" "NULL" "12992" "NULL" ...
## $ COSTT4_A : chr "21475" "20621" "16370" "21107" ...
## $ TUITFTE : chr "9427" "9899" "12459" "8956" ...
## $ INEXPFTE : chr "7437" "17920" "5532" "10211" ...
## $ PFTFAC : chr "0.8967" "0.9072" "0.6" "0.6221" ...
## $ PCTPELL : chr "0.7356" "0.346" "0.6801" "0.3072" ...
## $ C150_4 : chr "0.3525" "0.5554" "0.2222" "0.4614" ...
## $ PFTFTUG1_EF: chr "0.8578" "0.5041" "0.5" "0.475" ...
## $ RET_FT4 : chr "0.6595" "0.8288" "0" "0.7696" ...
## $ PCTFLOAN : chr "0.8284" "0.5214" "0.7795" "0.4596" ...
## - attr(*, "problems")=Classes 'tbl_df', 'tbl' and 'data.frame': 421 obs. of 5 variables:
## ..$ row : int 7283 7284 7285 7286 7287 7288 7289 7290 7291 7292 ...
## ..$ col : chr "CCBASIC" "CCBASIC" "CCBASIC" "CCBASIC" ...
## ..$ expected: chr "an integer" "an integer" "an integer" "an integer" ...
## ..$ actual : chr "NULL" "NULL" "NULL" "NULL" ...
## ..$ file : chr "'../../CSVs/PreETL_CollegeScorecard.csv'" "'../../CSVs/PreETL_CollegeScorecard.csv'" "'../../CSVs/PreETL_CollegeScorecard.csv'" "'../../CSVs/PreETL_CollegeScorecard.csv'" ...
## - attr(*, "spec")=List of 2
## ..$ cols :List of 30
## .. ..$ UNITID : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ INSTNM : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ CITY : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ STABBR : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ CONTROL : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ CCBASIC : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ ADM_RATE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ SAT_AVG : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_WHITE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_BLACK : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_HISP : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_ASIAN : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_AIAN : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_NHPI : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_2MOR : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_NRA : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_UNKN : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ PPTUG_EF : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ NPT4_PUB : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ NPT4_PRIV : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ COSTT4_A : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ TUITFTE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ INEXPFTE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ PFTFAC : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ PCTPELL : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ C150_4 : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ PFTFTUG1_EF: list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ RET_FT4 : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ PCTFLOAN : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## ..$ default: list()
## .. ..- attr(*, "class")= chr "collector_guess" "collector"
## ..- attr(*, "class")= chr "col_spec"
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = ""): invalid factor
## level, NA generated
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = ""): invalid factor
## level, NA generated
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = ""): invalid factor
## level, NA generated
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = ""): invalid factor
## level, NA generated
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = ""): invalid factor
## level, NA generated
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = ""): invalid factor
## level, NA generated
## [1] "ADM_RATE"
## [1] "SAT_AVG"
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = 0): invalid factor
## level, NA generated
## [1] "UGDS"
## [1] "UGDS_WHITE"
## [1] "UGDS_BLACK"
## [1] "UGDS_HISP"
## [1] "UGDS_ASIAN"
## [1] "UGDS_AIAN"
## [1] "UGDS_NHPI"
## [1] "UGDS_2MOR"
## [1] "UGDS_NRA"
## [1] "UGDS_UNKN"
## [1] "PPTUG_EF"
## [1] "NPT4_PUB"
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = 0): invalid factor
## level, NA generated
## [1] "NPT4_PRIV"
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = 0): invalid factor
## level, NA generated
## [1] "COSTT4_A"
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = 0): invalid factor
## level, NA generated
## [1] "TUITFTE"
## [1] "INEXPFTE"
## [1] "PFTFAC"
## [1] "PCTPELL"
## [1] "C150_4"
## [1] "PFTFTUG1_EF"
## Warning in `[<-.factor`(`*tmp*`, is.na(x), value = 0): invalid factor
## level, NA generated
## [1] "RET_FT4"
## [1] "PCTFLOAN"
## Classes 'tbl_df', 'tbl' and 'data.frame': 7703 obs. of 30 variables:
## $ UNITID : Factor w/ 7703 levels "100654","100663",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ INSTNM : Factor w/ 7535 levels "A and W Healthcare Educators",..: 95 6760 242 6761 99 6497 1136 391 408 407 ...
## $ CITY : Factor w/ 2542 levels "Aberdeen","Abilene",..: 1559 190 1432 1012 1432 2286 27 94 1432 98 ...
## $ STABBR : Factor w/ 59 levels "AK","AL","AR",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ CONTROL : Factor w/ 3 levels "1","2","3": 1 1 2 1 1 1 1 1 1 1 ...
## $ CCBASIC : Factor w/ 34 levels "-2","1","10",..: 11 8 14 9 12 9 2 16 11 9 ...
## $ ADM_RATE : num 0.526 0.856 NA 0.82 0.533 ...
## $ SAT_AVG : num 827 1107 NA 1210 851 ...
## $ UGDS : num 4206 11383 201 5451 4811 ...
## $ UGDS_WHITE : num 0.0333 0.5022 0.2 0.6088 0.0158 ...
## $ UGDS_BLACK : num 0.0353 0.26 0.4102 0.1255 0.0208 ...
## $ UGDS_HISP : num 0.0055 0.0283 0.006 0.0382 0.0121 0.0348 0.0044 0.0101 0.0074 0.0248 ...
## $ UGDS_ASIAN : num 0.001 0.0518 0.0034 0.0376 0.001 0.0106 0.0025 0.0053 0.0221 0.0227 ...
## $ UGDS_AIAN : num 0.0024 0.0022 0 0.0143 0.001 0.0038 0.0044 0.0157 0.0044 0.0074 ...
## $ UGDS_NHPI : num 0.001 0.0007 0 0.0002 0.0006 0 0 0.001 0.0016 0 ...
## $ UGDS_2MOR : num 0 0.0368 0 0.0172 0.0008 0.0261 0 0.0174 0.0207 0 ...
## $ UGDS_NRA : num 0.005 0.017 0 0.0332 0.0243 0.0268 0 0.0057 0.0307 0.01 ...
## $ UGDS_UNKN : num 0.0138 0.01 0.2715 0.035 0.0137 ...
## $ PPTUG_EF : num 0.0656 0.2607 0.4536 0.2146 0.0802 ...
## $ NPT4_PUB : num 15220 14780 NA 18506 11110 ...
## $ NPT4_PRIV : num NA NA 12002 NA NA ...
## $ COSTT4_A : num 21475 20621 16370 21107 18184 ...
## $ TUITFTE : num 427 800 12450 8056 7733 ...
## $ INEXPFTE : num 7437 17020 5532 10211 7618 ...
## $ PFTFAC : num 0.8067 0.0072 0.6 0.6221 0.653 ...
## $ PCTPELL : num 0.736 0.346 0.68 0.307 0.735 ...
## $ C150_4 : num 0.352 0.555 0.222 0.461 0.263 ...
## $ PFTFTUG1_EF: num 0.858 0.504 0.5 0.475 0.881 ...
## $ RET_FT4 : num 0.65 0.829 0 0.761 0.573 ...
## $ PCTFLOAN : num 0.828 0.521 0.77 0.451 0.755 ...
## - attr(*, "problems")=Classes 'tbl_df', 'tbl' and 'data.frame': 421 obs. of 5 variables:
## ..$ row : int 7283 7284 7285 7286 7287 7288 7289 7290 7291 7292 ...
## ..$ col : chr "CCBASIC" "CCBASIC" "CCBASIC" "CCBASIC" ...
## ..$ expected: chr "an integer" "an integer" "an integer" "an integer" ...
## ..$ actual : chr "NULL" "NULL" "NULL" "NULL" ...
## ..$ file : chr "'../../CSVs/PreETL_CollegeScorecard.csv'" "'../../CSVs/PreETL_CollegeScorecard.csv'" "'../../CSVs/PreETL_CollegeScorecard.csv'" "'../../CSVs/PreETL_CollegeScorecard.csv'" ...
## - attr(*, "spec")=List of 2
## ..$ cols :List of 30
## .. ..$ UNITID : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ INSTNM : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ CITY : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ STABBR : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ CONTROL : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ CCBASIC : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ ADM_RATE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ SAT_AVG : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_WHITE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_BLACK : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_HISP : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_ASIAN : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_AIAN : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_NHPI : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_2MOR : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_NRA : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ UGDS_UNKN : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ PPTUG_EF : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ NPT4_PUB : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ NPT4_PRIV : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ COSTT4_A : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ TUITFTE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ INEXPFTE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ PFTFAC : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ PCTPELL : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ C150_4 : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ PFTFTUG1_EF: list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ RET_FT4 : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ PCTFLOAN : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## ..$ default: list()
## .. ..- attr(*, "class")= chr "collector_guess" "collector"
## ..- attr(*, "class")= chr "col_spec"
## CREATE TABLE CSVsPreETL_CollegeScorecard (
## -- Change table_name to the table name you want.
## UNITID varchar2(4000),
## INSTNM varchar2(4000),
## CITY varchar2(4000),
## STABBR varchar2(4000),
## CONTROL varchar2(4000),
## CCBASIC varchar2(4000),
## ADM_RATE number(38,4),
## SAT_AVG number(38,4),
## UGDS number(38,4),
## UGDS_WHITE number(38,4),
## UGDS_BLACK number(38,4),
## UGDS_HISP number(38,4),
## UGDS_ASIAN number(38,4),
## UGDS_AIAN number(38,4),
## UGDS_NHPI number(38,4),
## UGDS_2MOR number(38,4),
## UGDS_NRA number(38,4),
## UGDS_UNKN number(38,4),
## PPTUG_EF number(38,4),
## NPT4_PUB number(38,4),
## NPT4_PRIV number(38,4),
## COSTT4_A number(38,4),
## TUITFTE number(38,4),
## INEXPFTE number(38,4),
## PFTFAC number(38,4),
## PCTPELL number(38,4),
## C150_4 number(38,4),
## PFTFTUG1_EF number(38,4),
## RET_FT4 number(38,4),
## PCTFLOAN number(38,4)
## );
Cleaned data can be downloaded from Data.world as a .csv file. Because the dataset is so large, we filtered to only show some rows.
Hosting User: jlee Database: S17 DV Final Project Dataset Name: CollegeScorecard.csv
Download Link: https://query.data.world/s/dv5dl8q1jx2qb3d3bd2976b9d
source("../01 Data/Accessdataworld.R")
## Loading required package: data.world
## Warning in utils::read.table(propsfile, header = FALSE, sep = "=",
## row.names = 1, : incomplete final line found by readTableHeader on 'C:/
## Users/Blanche/Documents/.data.world'
## Warning in utils::read.table(propsfile, header = FALSE, sep = "=",
## row.names = 1, : incomplete final line found by readTableHeader on 'C:/
## Users/Blanche/Documents/.data.world'
## UNITID INSTNM CITY STABBR
## Min. :100654 Length:10 Length:10 Length:10
## 1st Qu.:101989 Class :character Class :character Class :character
## Median :143487 Mode :character Mode :character Mode :character
## Mean :126860
## 3rd Qu.:143536
## Max. :143613
##
## CONTROL CCBASIC ADM_RATE SAT_AVG
## Min. :1.0 Min. :-2.0 Min. :0.5256 Min. : 827
## 1st Qu.:1.0 1st Qu.:-2.0 1st Qu.:0.5293 1st Qu.: 967
## Median :2.5 Median : 2.0 Median :0.5330 Median :1107
## Mean :2.1 Mean : 5.9 Mean :0.6382 Mean :1038
## 3rd Qu.:3.0 3rd Qu.:13.5 3rd Qu.:0.6945 3rd Qu.:1144
## Max. :3.0 Max. :21.0 Max. :0.8560 Max. :1181
## NA's :7 NA's :7
## UGDS UGDS_WHITE UGDS_BLACK UGDS_HISP
## Min. : 2 Min. :0.00000 Min. :0.01550 Min. :0.0055
## 1st Qu.: 55 1st Qu.:0.01995 1st Qu.:0.04167 1st Qu.:0.0346
## Median : 650 Median :0.33715 Median :0.11515 Median :0.0677
## Mean : 2373 Mean :0.31798 Mean :0.20395 Mean :0.1203
## 3rd Qu.: 3530 3rd Qu.:0.49350 3rd Qu.:0.25093 3rd Qu.:0.1918
## Max. :11383 Max. :0.77880 Max. :0.74740 Max. :0.3380
##
## UGDS_ASIAN UGDS_AIAN UGDS_NHPI UGDS_2MOR
## Min. :0.00000 Min. :0.00000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.00025 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.001275
## Median :0.01215 Median :0.00230 Median :0.000350 Median :0.011350
## Mean :0.01906 Mean :0.00769 Mean :0.001820 Mean :0.013410
## 3rd Qu.:0.03320 3rd Qu.:0.00860 3rd Qu.:0.001225 3rd Qu.:0.019125
## Max. :0.05180 Max. :0.04510 Max. :0.013200 Max. :0.036800
##
## UGDS_NRA UGDS_UNKN PPTUG_EF NPT4_PUB
## Min. :0.00000 Min. :0.000000 Min. :0.0017 Min. : 1044
## 1st Qu.:0.00000 1st Qu.:0.001275 1st Qu.:0.1144 1st Qu.: 6268
## Median :0.00000 Median :0.010000 Median :0.3100 Median :11395
## Mean :0.00381 Mean :0.020510 Mean :0.3820 Mean : 9764
## 3rd Qu.:0.00375 3rd Qu.:0.026100 3rd Qu.:0.6693 3rd Qu.:14890
## Max. :0.01700 Max. :0.070200 Max. :0.8526 Max. :15220
## NA's :6
## NPT4_PRIV COSTT4_A TUITFTE INEXPFTE
## Min. : 4638 Min. : 8557 Min. : 423 Min. : 535
## 1st Qu.: 5808 1st Qu.:12818 1st Qu.: 1176 1st Qu.: 5636
## Median :10206 Median :20621 Median : 4249 Median : 6830
## Mean :11958 Mean :21727 Mean : 5054 Mean : 7262
## 3rd Qu.:17634 3rd Qu.:21475 3rd Qu.: 8224 3rd Qu.: 7529
## Max. :22246 Max. :45162 Max. :11878 Max. :17020
## NA's :4 NA's :5
## PFTFAC PCTPELL C150_4 PFTFTUG1_EF
## Min. :0.0072 Min. :0.0200 Min. :0.3525 Min. :0.0230
## 1st Qu.:0.2663 1st Qu.:0.3317 1st Qu.:0.4540 1st Qu.:0.1742
## Median :0.6410 Median :0.5780 Median :0.5554 Median :0.2027
## Mean :0.5442 Mean :0.4885 Mean :0.5167 Mean :0.3524
## 3rd Qu.:0.8067 3rd Qu.:0.7081 3rd Qu.:0.5988 3rd Qu.:0.5041
## Max. :1.0000 Max. :0.8065 Max. :0.6422 Max. :0.8578
## NA's :5 NA's :7 NA's :5
## RET_FT4 PCTFLOAN
## Min. :0.6505 Min. :0.0000
## 1st Qu.:0.7397 1st Qu.:0.1911
## Median :0.8288 Median :0.4432
## Mean :0.7721 Mean :0.4236
## 3rd Qu.:0.8329 3rd Qu.:0.6591
## Max. :0.8371 Max. :0.8284
## NA's :7
## State AreaName Population 18 Years or Over
## Length:52 Length:52 Min. : 441946
## Class :character Class :character 1st Qu.: 1350031
## Mode :character Mode :character Median : 3230085
## Mean : 4723276
## 3rd Qu.: 5328190
## Max. :29247121
## Population 25 Years or Over
## Min. : 384272
## 1st Qu.: 1169237
## Median : 2835094
## Mean : 4113026
## 3rd Qu.: 4638242
## Max. :25257858
## Population 25 Years or Over with No Schooling Completed
## Min. : 1049
## 1st Qu.: 11192
## Median : 32307
## Mean : 58463
## 3rd Qu.: 63731
## Max. :652517
## Population 25 Years or Over with Bachelor's Degree
## Min. : 65787
## 1st Qu.: 200538
## Median : 464927
## Mean : 761284
## 3rd Qu.:1002277
## Max. :5002596
## Population 25 Years or Over with Master's Degree
## Min. : 23501
## 1st Qu.: 84904
## Median : 199057
## Mean : 325993
## 3rd Qu.: 470375
## Max. :1965005
## Population 25 Years or Over with Doctorate's Degree
## Min. : 4932
## 1st Qu.: 21499
## Median : 48006
## Mean : 80593
## 3rd Qu.:103566
## Max. :583790
## # A tibble: 6 × 30
## UNITID INSTNM CITY STABBR CONTROL
## <int> <chr> <chr> <chr> <int>
## 1 100654 Alabama A and M University Normal AL 1
## 2 100663 University of Alabama at Birmingham Birmingham AL 1
## 3 100937 Birmingham Southern College Birmingham AL 2
## 4 105145 GateWay Community College Phoenix AZ 1
## 5 143482 Cannella School of Hair Design-Chicago Chicago IL 3
## 6 143491 Cannella School of Hair Design-Chicago Chicago IL 3
## # ... with 25 more variables: CCBASIC <int>, ADM_RATE <dbl>,
## # SAT_AVG <int>, UGDS <int>, UGDS_WHITE <dbl>, UGDS_BLACK <dbl>,
## # UGDS_HISP <dbl>, UGDS_ASIAN <dbl>, UGDS_AIAN <dbl>, UGDS_NHPI <dbl>,
## # UGDS_2MOR <dbl>, UGDS_NRA <dbl>, UGDS_UNKN <dbl>, PPTUG_EF <dbl>,
## # NPT4_PUB <int>, NPT4_PRIV <int>, COSTT4_A <int>, TUITFTE <int>,
## # INEXPFTE <int>, PFTFAC <dbl>, PCTPELL <dbl>, C150_4 <dbl>,
## # PFTFTUG1_EF <dbl>, RET_FT4 <dbl>, PCTFLOAN <dbl>
## # A tibble: 6 × 8
## State AreaName `Population 18 Years or Over`
## <chr> <chr> <int>
## 1 AL Alabama 3718646
## 2 AK Alaska 545248
## 3 LA Louisiana 3510657
## 4 ME Maine 1066847
## 5 AZ Arizona 5024639
## 6 MD Maryland 4582258
## # ... with 5 more variables: `Population 25 Years or Over` <int>,
## # `Population 25 Years or Over with No Schooling Completed` <int>,
## # `Population 25 Years or Over with Bachelor's Degree` <int>,
## # `Population 25 Years or Over with Master's Degree` <int>, `Population
## # 25 Years or Over with Doctorate's Degree` <int>
Descriptions: Refer to visualization captions for individual descriptions. Dataset Column Names: INSTNM - Institution Name; STABBR - State; CONTROL - 1 = Public. 2 = Private nonprofit. 3 = Private for-profit Boxplot: Average Cost of Attendance for Type of School
These boxplots (Tableau left, Shiny right) demonstrate the average cost of attendance per type of school.
Histogram: SAT Averages for Universities
These histograms (Tableau left, Shiny right) demonstrate the SAT averages for universities across the United States.
Scatterplot: Instructional Expenditures vs. Net tuition
These scatterplots (Tableau left, Shiny right) explore the correlation between Instructional expenditures per full-time equivalent student and Net tuition revenue per full-time equivalent student.
Crosstab 1: Instructional Expenditures / Cost of Attendance
These crosstabs (Tableau left, Shiny right) demonstrate the ratio of instructional expenses to the average cost of attendance. They are labeled by the average cost of attendance. The red tile indicates a high ratio. The green tile indicates a medium ratio, and the blue tile indicates a low ratio. From the crosstab, one can see that public schools usually have a higher ratio, while private non-profit schools usually have a medium ratio. Private schools mostly have a medium to low ratio with the exception of some high ratios in four states.
Crosstab 2: Tuition Revenue / Total Cost
These crosstabs (Tableau left, Shiny right) demonstratae the ratio of the net tuition revenue per full-time student to the average cost of attendance. The red tile indicates a high ratio. The green tile indicates a medium ratio, and the blue tile indicates a low ratio. From the crosstab, one can see that public schools usually a medium ratio, while private non-profit schools usally have a medium to high ratio. Private schools mostly have a high ratio with the exception of some low and medium ratios some states.
Map 1: Region Cost of Attendance (Instructional Expenditures / Cost of Attendance)
These maps (Tableau left, Shiny right) demonstrate the distribution of instructional expenditure / cost of attendance ratio across the United States, where darker colors indicate higher ratios.
Map 2: Tuition Revenue to Total Cost
These maps (Tableau left, Shiny right) demonstrate the distribution of tuition revenue / total cost ratio across the United States, where darker colors indicate higher ratios.
Barchart: Instructional Expense per Type of Instutition
This barchart + table calculations (Tableau left, Shiny right) display the sum of instructional expenses across each control (public, private non-profit, and private for profit) for each state. The line shows the average of the sum of instructional expenses. This ID Sets on a map for barcharts has two sets: High Net Price and Medium Net Price for public schools. Net price is the actual amount families pay on average. The dots represent schools in the High Net Price.
Description: Full size static .pngs of the Shiny application, as well as a link to the live published version.
Published Link: https://ehjkim.shinyapps.io/shinyfinal/
Boxplot: Average Cost of Attendance for Type of School
Histogram: SAT Averages for Universities
Scatterplot: Instructional Expenditures vs. Net tuition
Crosstab 1: Instructional Expenditures/Cost of Attendance
Crosstab 2: Tuition Revenue / Total Cost
Map 1: Region Cost of Attendance (Instructional Expenditures / Cost of Attendance)
Map 2: Tuition Revenue to Total Cost
Barchart: Instructional Expense per Type of Instutition
Descriptions: Full size static .pngs of the tableau visualizations. Refer to visualization captions for individual descriptions.
Boxplot: Average Cost of Attendance for Type of School
Histogram: SAT Averages for Universities
Scatterplot: Instructional Expenditures vs. Net tuition
Crosstab 1: Instructional Expenditures/Cost of Attendance
Crosstab 2: Tuition Revenue / Total Cost
Map 1: Region Cost of Attendance (Instructional Expenditures / Cost of Attendance)
Map 2: Tuition Revenue to Total Cost
Barchart: Instructional Expense per Type of Instutition